Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add C# deterministic serialization #13160

Conversation

fmg-lydonchandra
Copy link
Contributor

@fmg-lydonchandra fmg-lydonchandra commented Jun 28, 2023

#12881

@jskeet review please :-) as this is my first contrib to protobuf, any design / performance feedback is very sought after

@google-cla
Copy link

google-cla bot commented Jun 28, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@@ -452,7 +452,17 @@ public void WriteTo(CodedOutputStream output, Codec codec)
WriteContext.Initialize(output, out WriteContext ctx);
try
{
WriteTo(ref ctx, codec);
if (output.IsSerializationDeterministic())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I unit-test this ? Is this for old proto2 format ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by that. Is what for proto2 format?

@@ -691,6 +691,54 @@ public void OneofSerialization_DefaultValue()
});
}

[Test]
public void MapSerializationDeterministicTrue_ThenBytesIdentical()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add more tests for different types of keys, once the general implementation approach is approved.

@fmg-lydonchandra fmg-lydonchandra marked this pull request as ready for review June 28, 2023 07:00
@fmg-lydonchandra fmg-lydonchandra requested a review from a team as a code owner June 28, 2023 07:00
@fmg-lydonchandra fmg-lydonchandra requested review from jskeet and removed request for a team June 28, 2023 07:00
@jskeet
Copy link
Contributor

jskeet commented Jun 28, 2023

I'll get to this when I can - it may not be this week though.

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various suggestions, but I'm happy with the overall approach.

/// maps are sorted on the lexicographical order of the UTF8 encoded keys.</description></item>
/// </list>
/// </remarks>
public void UseDeterministicSerialization() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be a property:

public bool DeterministicSerialization { get; set; }

I'm not hugely keen on the stream being mutable like this, but it appears to be the design in other languages.

(Naming could be IsDeterministic or just Deterministic... I'm not hugely bothered on that front.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to property: Deterministic

@@ -452,7 +452,17 @@ public void WriteTo(CodedOutputStream output, Codec codec)
WriteContext.Initialize(output, out WriteContext ctx);
try
{
WriteTo(ref ctx, codec);
if (output.IsSerializationDeterministic())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by that. Is what for proto2 format?

/// change in the future)
/// <list type="bullet">
/// <item><description>Sort map entries by keys in lexicographical order or numerical order. Note: For string
/// keys, the order is based on comparing the Unicode value of each character in the strings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each Unicode character, or each UTF-16 code unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to UTF-16 code unit

}
else
{
WriteTo(ref ctx, codec, list);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better not to duplicate the WriteTo call here. Something like:

IList<KeyValuePair<TKey, TValue>> listToWrite = list;
if (output.IsDeterministic)
{
    listToWrite = new List<KeyValuePair<TKey, TValue>>(list);
    listToWrite.Sort(...);
}
WriteTo(ref ctx, codec, listToWrite);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to call GetSortedListCopy

@@ -469,7 +479,24 @@ public void WriteTo(CodedOutputStream output, Codec codec)
[SecuritySafeCritical]
public void WriteTo(ref WriteContext ctx, Codec codec)
{
foreach (var entry in list)
if (ctx.state.CodedOutputStream is not null && ctx.state.CodedOutputStream.IsSerializationDeterministic())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be:

if (ctx.state.CodedOutputStream?.IsDeterministic == true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to if (ctx.state.CodedOutputStream?.Deterministic ?? false)

// We can't sort the list in place, as that would invalidate the linked list.
// Instead, we create a new list, sort that, and then write it out.
var sorted = new List<KeyValuePair<TKey, TValue>>(list);
sorted.Sort((pair1, pair2) => Comparer<TKey>.Default.Compare(pair1.Key, pair2.Key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the repetition here, I'd probably extract the logic into a method of GetSortedListCopy or similar. Otherwise if we tweaked how the comparer is set up, it could end up inconsistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to GetSortedListCopy

if (output.IsSerializationDeterministic())
{
var sorted = new List<KeyValuePair<TKey, TValue>>(list);
sorted.Sort((pair1, pair2) => Comparer<TKey>.Default.Compare(pair1.Key, pair2.Key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Comparer.Default begs the question as to whether the default string comparison is ordinal or not. I believe it is, but I'm not entirely sure.

I'd at least add a comment here - and possibly also note that map keys are guaranteed to be bools, integers or strings; never bytestrings or messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is NOT ordinal by default, explicitly using StringComparer.Ordinal now

@mkruskal-google mkruskal-google added c# 🅰️ safe for tests Mark a commit as safe to run presubmits over labels Jun 28, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jun 28, 2023
@jskeet
Copy link
Contributor

jskeet commented Jun 29, 2023

Thanks for the additional commit - I'll review on Monday.

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this looks good to me. Will rerun tests.

@@ -160,6 +160,33 @@ public long Position
}
}

/// <summary>
/// Configures serialization to be deterministic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "Configures whether or not serialization is deterministic." (It's a property, but the current text sounds sounds like it's a method.)

/// The deterministic serialization guarantees that for a given binary, equal messages (defined by the
/// equals methods in protos) will always be serialized to the same bytes. This implies:
/// <list type="bullet">
/// <item><description>Repeated serialization of a message will return the same bytes</description></item>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add period

/// Configures serialization to be deterministic.
/// </summary>
/// <remarks>
/// The deterministic serialization guarantees that for a given binary, equal messages (defined by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "The deterministic serialization" => "Deterministic serialization"

@jskeet jskeet added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jul 3, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jul 3, 2023
@jskeet jskeet added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jul 3, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jul 3, 2023
@jskeet
Copy link
Contributor

jskeet commented Jul 3, 2023

The tests appear to be failing in CI - I'll take a look.

@jskeet jskeet added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jul 3, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jul 3, 2023
copybara-service bot pushed a commit that referenced this pull request Sep 22, 2023
#12881

@jskeet  review please :-) as this is my first contrib to protobuf, any design / performance feedback is very sought after

Closes #13160

COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b
PiperOrigin-RevId: 567635727
copybara-service bot pushed a commit that referenced this pull request Sep 26, 2023
#12881

@jskeet  review please :-) as this is my first contrib to protobuf, any design / performance feedback is very sought after

Closes #13160

COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b
PiperOrigin-RevId: 567635727
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants